Executed: Mon Mar 27 11:39:53 2017

Duration: 5 seconds.

usALEX - Corrections - Direct excitation physical parameter

This notebook estimates direct excitation coefficient $d_T$ from μs-ALEX data.

Definitions Memo

$$n_d = I_{D_{ex}} \, \sigma_{D_{ex}}^D \, \phi_D \, \eta_{D_{det}}^{D_{em}} \, (1-E)$$$$n_a = I_{D_{ex}} \, \sigma_{D_{ex}}^D \, \phi_A \, \eta_{A_{det}}^{A_{em}} \, E$$$$ n_{aa} = I_{A_{ex}} \, \sigma_{A_{ex}}^A \, \phi_A \, \eta_{A_{det}}^{A_{em}}$$$$n_a^* = n_a + Lk + Dir$$

where

$$Lk = I_{D_{ex}} \, \sigma_{D_{ex}}^D \, \phi_D \, \eta_{A_{det}}^{D_{em}} \, (1-E)$$$$Dir = I_{D_{ex}} \, \sigma_{D_{ex}}^A \, \phi_A \, \eta_{A_{det}}^{A_{em}}$$$$\gamma = \frac{\phi_A\,\eta_{D_{det}}^{A_{em}}}{\phi_D\,\eta_{D_{det}}^{D_{em}}}$$

Aim of this notebook

What is already computed?

We previously fitted the leakage and gamma coefficient from the RAW PR values for the 5 dsDNA measurements. We also fitted the direct excitation coefficient expressed (dir_ex_aa) as a function of the A-signal during A-excitation (naa). In symbols, dir_ex_aa is defined as:

$$ d_{AA} = \frac{n_{a}}{n_{aa}}$$

for a A-only population.

What we want to compute?

Alternatively, we can express the direct excitation contribution ($Dir$) as a function of the total corrected burst size:

$$ Dir = d_T\, (n_a + \gamma n_d)$$

With this definition, expressing $d_T$ as a function of the physical parameters we obtain:

$$d_T = \frac{\sigma_{D_{ex}}^A}{\sigma_{D_{ex}}^D} $$

where $\sigma_{Dex}^A$ and $\sigma_{Dex}^D$ are the absorption cross-sections of the Acceptor and Donor dye at wavelength of Donor laser.

Finally, remembering the definition of $\beta$:

$$ \beta = \frac{I_{A_{ex}}\sigma_{A_{ex}}^A}{I_{D_{ex}}\sigma_{D_{ex}}^D}$$

we can express $d_T$ as the product of $\beta$ and $d_{AA}$:

$$ d_T = \beta \, d_{AA}$$

Note that $d_T$ is a property of the Donor-Acceptor dyes pair and of the Donor excitation wavelength. As such, differently from $d_AA$, the $d_T$ coefficient is valid for the same sample in any setup using the same donor excitation wavelength, such as the single-spot μs-ALEX and the multi-spot system. Additionally, $d_T$ allows to correct for direct acceptor excitation using only donor-excitation quantities. Therefore the same correction formula can be used both in two-laser (e.g. single-spot μs-ALEX) and single-laser systems (e.g. 8-spot system).

How we compute it?

We use two different procedures both yielding an estimation of $d_T$. Except for the numerical accuracy the two procedures are equivalent.

Procedure 1: Using beta

From the previous relation between $d_T = \beta \,d_{AA}$ is possible to directly estimate $d_T$ with the values of $\beta$ and $d_{AA}$ we already fitted in previous notebooks.

Procedure 2: Correction formula

It is possible to go from the raw $E_R$ (only background correction, no leakage, direct excitation nor gamma) to the fully-corrected $E$ using the formula:

$$ E = f(E_R,\, \gamma,\, L_k,\, d_T) = \frac{E_{R} \left(L_{k} + d_T \gamma + 1\right) - L_{k} - d_T \gamma} {E_{R} \left(L_{k} - \gamma + 1\right) - L_{k} + \gamma}$$

See Derivation of FRET and S correction formulas for derivation.

We can compute the corrected $E$ for the 5 dsDNA samples by fitting the fully-corrected histograms (histograms with γ, leakage and direct excitation corrections). We can also fit the 5 $E_R$ values for the same samples from the proximity ratio histograms (only background correction).

Therefore, using the previous formula we can fit $d_T$ (dir_ex_t) by minimizing the error between the 5 $E$ values fitted from corrected histograms and the 5 $E$ values obtained correcting the 5 $E_R$ values from the fit of the proximity ratio histograms.

Loading data

We load the needed libraries and FRETBursts which includes the FRET correction formulas ($E = f(E_R,\, \gamma,\, L_k,\, d_T)$).



In [1]:

    
import os
import numpy as np
import pandas as pd
import lmfit
from fretbursts import *
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format='retina'  # for hi-dpi displays
sns.set_style('whitegrid')









    



 - Optimized (cython) burst search loaded.
 - Optimized (cython) photon counting loaded.
--------------------------------------------------------------
 You are running FRETBursts (version 0.5.9).

 If you use this software please cite the following paper:

   FRETBursts: An Open Source Toolkit for Analysis of Freely-Diffusing Single-Molecule FRET
   Ingargiola et al. (2016). http://dx.doi.org/10.1371/journal.pone.0160716 

--------------------------------------------------------------

Load Raw PR



In [2]:

    
#bsearch_ph_sel = 'AND-gate'
bsearch_ph_sel = 'Dex'

data_file = 'results/usALEX-5samples-PR-raw-%s.csv' % bsearch_ph_sel

These are the RAW proximity ratios for the 5 samples (only background correction, no leakage nor direct excitation):



In [3]:

    
data_raw = pd.read_csv(data_file).set_index('sample')
data_raw[['E_gauss_w', 'E_kde_w', 'E_gauss_w_err', 'E_gauss_w_fiterr', 'n_bursts_all', 'n_bursts_fret']]









    Out[3]:






  
    
      
      E_gauss_w
      E_kde_w
      E_gauss_w_err
      E_gauss_w_fiterr
      n_bursts_all
      n_bursts_fret
    
    
      sample
      
      
      
      
      
      
    
  
  
    
      7d
      0.927817
      0.9338
      0.002130
      0.002408
      1654
      641
    
    
      12d
      0.748896
      0.7508
      0.002467
      0.001494
      1692
      1157
    
    
      17d
      0.480670
      0.4906
      0.002001
      0.002126
      3459
      2588
    
    
      22d
      0.280392
      0.2752
      0.001454
      0.001775
      2798
      2181
    
    
      27d
      0.194729
      0.1944
      0.002032
      0.001035
      1146
      792

Columns legend

E_gauss_w, center of Gaussian peak, fitted on the weighted histograms.
E_kde_w, maximum of the KDE computed on the weighted histograms.
E_gauss_w_err is $\frac{\sigma}{\sqrt{n_{FRET}}}$, where $\sigma$ is the std.dev. of the Gaussian peak and $n_{FRET}$ is the number of bursts in the FRET population.
E_gauss_w_fiterr is the standard error returned by the non-linear least square fit. It is computed from the Jacobian taking into account the degrees of freedom.
n_bursts_all, total number of bursts
n_bursts_fret, number of bursts in the FRET population

Load Corrected E

And these are the FRET efficiencies fitted from corrected histograms for the same 5 samples:



In [4]:

    
data_file = 'results/usALEX-5samples-E-corrected-all-ph.csv'
data_corr = pd.read_csv(data_file).set_index('sample')
data_corr[['E_gauss_w', 'E_kde_w', 'E_gauss_w_err', 'E_gauss_w_fiterr', 'n_bursts_all', 'n_bursts_fret']]









    Out[4]:






  
    
      
      E_gauss_w
      E_kde_w
      E_gauss_w_err
      E_gauss_w_fiterr
      n_bursts_all
      n_bursts_fret
    
    
      sample
      
      
      
      
      
      
    
  
  
    
      7d
      0.924307
      0.9290
      0.002583
      0.001782
      1172
      542
    
    
      12d
      0.727043
      0.7398
      0.002990
      0.001995
      1307
      948
    
    
      17d
      0.422624
      0.4268
      0.002549
      0.002008
      2489
      1964
    
    
      22d
      0.181067
      0.1774
      0.001888
      0.001536
      2054
      1672
    
    
      27d
      0.083971
      0.0826
      0.002856
      0.001585
      791
      589

Columns legend

E_gauss_w, center of Gaussian peak, fitted on the weighted histograms.
E_kde_w, maximum of the KDE computed on the weighted histograms.
E_gauss_w_err is $\frac{\sigma}{\sqrt{n_{FRET}}}$, where $\sigma$ is the std.dev. of the Gaussian peak and $n_{FRET}$ is the number of bursts in the FRET population.
E_gauss_w_fiterr is the standard error returned by the non-linear least square fit. It is computed from the Jacobian taking into account the degrees of freedom.
n_bursts_all, total number of bursts
n_bursts_fret, number of bursts in the FRET population

Load SNA Data



In [5]:

    
raw_data_file_sna = 'results/alix/us-ALEX SNA Results 2016-10-12.csv'



In [6]:

    
rsna = pd.read_csv(raw_data_file_sna, index_col=0)
rsna



In [7]:

    
sna = rsna[['<E>', 'Mode']].round(4)
sna.columns = ['SNA_E_mean', 'SNA_E_max']
sna



In [8]:

    
data_file_sna = 'results/usALEX-5samples-E-SNA.csv'



In [9]:

    
sna.to_csv(data_file_sna)

Load μs-ALEX corrections



In [10]:

    
leakage_coeff_fname = 'results/usALEX - leakage coefficient DexDem.csv'
leakage = np.loadtxt(leakage_coeff_fname)

print('Leakage coefficient:', leakage)









    



Leakage coefficient: 0.10029



In [11]:

    
dir_ex_coeff_fname = 'results/usALEX - direct excitation coefficient dir_ex_aa.csv'
dir_ex_aa = np.loadtxt(dir_ex_coeff_fname)

print('Dir. excitation AA:', dir_ex_aa)









    



Dir. excitation AA: 0.06062



In [12]:

    
dir_ex_t_datasheet_fname = 'results/Dyes - ATT0647N-ATTO550 abs X-section ratio at 532nm.csv'
dir_ex_t_datasheet = np.loadtxt(dir_ex_t_datasheet_fname)

print('Direct excitation (dir_ex_t) from datasheet:', dir_ex_t_datasheet)









    



Direct excitation (dir_ex_t) from datasheet: 0.106516290727



In [13]:

    
gamma_coeff_fname = 'results/usALEX - gamma factor - all-ph.csv'
gamma = np.loadtxt(gamma_coeff_fname)

print('Gamma factor:', gamma)









    



Gamma factor: 1.020526



In [14]:

    
beta_coeff_fname = 'results/usALEX - beta factor - all-ph.csv'
beta = np.loadtxt(beta_coeff_fname)

print('Beta factor:', beta)









    



Beta factor: 0.813553

Procedure 1

Compute $d_T$ using $\beta$ and $d_{AA}$:



In [15]:

    
dir_ex_t_beta = dir_ex_aa * beta
'%.5f' % dir_ex_t_beta









    Out[15]:





'0.04932'



In [16]:

    
with open('results/usALEX - direct excitation coefficient dir_ex_t beta.csv', 'w') as f:
    f.write('%.5f' % dir_ex_t_beta)

With this coefficient, computing the corrected $E$ for the 5 dsDNA samples we obtain:



In [17]:

    
PR_corr_kde = fretmath.correct_E_gamma_leak_dir(data_raw.E_kde_w, 
                                                leakage=leakage, 
                                                dir_ex_t=dir_ex_t_beta,
                                                gamma=gamma)*100
PR_corr_kde









    Out[17]:





sample
7d     92.873302
12d    72.773077
17d    43.140290
22d    17.621842
27d     7.807774
Name: E_kde_w, dtype: float64



In [18]:

    
PR_corr_gauss = fretmath.correct_E_gamma_leak_dir(data_raw.E_gauss_w, 
                                                  leakage=leakage, 
                                                  dir_ex_t=dir_ex_t_beta,
                                                  gamma=gamma)*100
PR_corr_gauss









    Out[18]:





sample
7d     92.225478
12d    72.560799
17d    41.983996
22d    18.247880
27d     7.848009
Name: E_gauss_w, dtype: float64

Procedure 2

Datasheet-based direct excitation

The coefficient $d_T$ can be estimated from data-sheet values of $\sigma_{D_{ex}}^A$ and $\sigma_{D_{ex}}^D$.

Using the datasheet values provided by ATTOTec (in PBS buffer) we obtain a $d_T$ estimation close to 10%:



In [19]:

    
dir_ex_t_datasheet









    Out[19]:





array(0.106516290727)

With this the corrected $E$ for the 5 dsDNA samples are:



In [20]:

    
E_datasheet = fretmath.correct_E_gamma_leak_dir(data_raw.E_kde_w, 
                                                leakage=leakage, 
                                                dir_ex_t=dir_ex_t_datasheet,
                                                gamma=gamma)*100
E_datasheet









    Out[20]:





sample
7d     92.484823
12d    71.288927
17d    40.040846
22d    13.131377
27d     2.782340
Name: E_kde_w, dtype: float64

Comparing these values with the ones obtained fitting the corrected E histograms we observe a significant discrepancy:



In [21]:

    
out = data_corr[['E_kde_w']].copy()*100
out.columns = ['E_alex']
out['E_datasheet'] = E_datasheet
out









    Out[21]:






  
    
      
      E_alex
      E_datasheet
    
    
      sample
      
      
    
  
  
    
      7d
      92.90
      92.484823
    
    
      12d
      73.98
      71.288927
    
    
      17d
      42.68
      40.040846
    
    
      22d
      17.74
      13.131377
    
    
      27d
      8.26
      2.782340



In [22]:

    
out.plot(alpha=0.4, lw=3, style=dict(E_alex='-ob', E_datasheet='-sr'));

NOTE: The corrected FRET efficiencies using the datasheet and μs-ALEX-based direct excitation do not match well.

Fitting direct excitation $d_T$



In [23]:

    
def residuals_absolute(params, E_raw, E_ref):
    dir_ex_t = params['dir_ex_t'].value
    return E_ref - fretmath.correct_E_gamma_leak_dir(E_raw, 
                                                     leakage=leakage, 
                                                     gamma=gamma, 
                                                     dir_ex_t=dir_ex_t)



In [24]:

    
def residuals_relative(params, E_raw, E_ref):
    dir_ex_t = params['dir_ex_t'].value
    return (E_ref - fretmath.correct_E_gamma_leak_dir(E_raw, 
                                                      leakage=leakage, 
                                                      gamma=gamma, 
                                                      dir_ex_t=dir_ex_t))/E_ref



In [25]:

    
params = lmfit.Parameters()
params.add('dir_ex_t', value=0.05)



In [26]:

    
m = lmfit.minimize(residuals_absolute, params, args=(data_raw.E_kde_w, data_corr.E_kde_w))
lmfit.report_fit(m.params, show_correl=False)









    



[[Variables]]
    dir_ex_t:   0.04614939 +/- 0.004939 (10.70%) (init= 0.05)



In [27]:

    
m = lmfit.minimize(residuals_relative, params, args=(data_raw.E_kde_w, data_corr.E_kde_w))
lmfit.report_fit(m.params, show_correl=False)









    



[[Variables]]
    dir_ex_t:   0.04482660 +/- 0.001150 (2.57%) (init= 0.05)

NOTE: The fitted dir_ex_t is 4.5% as opposed to 10.6% as expected from the absorption spectra of ATTO550 and ATTO647 at 532nm.



In [28]:

    
'%.5f' % m.params['dir_ex_t'].value









    Out[28]:





'0.04483'



In [29]:

    
with open('results/usALEX - direct excitation coefficient dir_ex_t fit.csv', 'w') as f:
    f.write('%.5f' % m.params['dir_ex_t'].value)



In [30]:

    
PR_corr_kde_dfit = fretmath.correct_E_gamma_leak_dir(data_raw.E_kde_w, 
                                                     leakage=leakage, 
                                                     dir_ex_t=m.params['dir_ex_t'].value,
                                                     gamma=gamma)*100
PR_corr_kde_dfit.name = 'PR_corr_kde_dfit'
PR_corr_kde_dfit









    Out[30]:





sample
7d     92.903804
12d    72.889605
17d    43.383644
22d    17.974412
27d     8.202347
Name: PR_corr_kde_dfit, dtype: float64

Corrected E



In [31]:

    
E = pd.concat([data_corr[['E_kde_w', 'E_gauss_w']]*100, PR_corr_kde, PR_corr_gauss, sna*100], axis=1)
E.columns = ['E KDE', 'E Gauss', 'PR KDE', 'PR Gauss', 'SNA E mean', 'SNA E max']
E



In [32]:

    
E.plot.bar(table=np.round(E, 2).T)
plt.ylabel('FRET (%)')
plt.gca().xaxis.set_visible(False)
#plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);

Estimated errors



In [33]:

    
Eerr = (pd.DataFrame().assign(
            E_gauss_w_err = data_corr.E_gauss_w_err, E_gauss_w_fiterr = data_corr.E_gauss_w_fiterr,
            PR_gauss_w_err = data_raw.E_gauss_w_err, PR_gauss_w_fiterr = data_raw.E_gauss_w_fiterr,
            SNA_SDV=rsna['SDV(E)']))*100
Eerr









    Out[33]:






  
    
      
      E_gauss_w_err
      E_gauss_w_fiterr
      PR_gauss_w_err
      PR_gauss_w_fiterr
      SNA_SDV
    
    
      sample
      
      
      
      
      
    
  
  
    
      7d
      0.2583
      0.1782
      0.2130
      0.2408
      4.50
    
    
      12d
      0.2990
      0.1995
      0.2467
      0.1494
      6.75
    
    
      17d
      0.2549
      0.2008
      0.2001
      0.2126
      8.00
    
    
      22d
      0.1888
      0.1536
      0.1454
      0.1775
      3.50
    
    
      27d
      0.2856
      0.1585
      0.2032
      0.1035
      0.00



In [34]:

    
nbursts = data_corr[['n_bursts_all', 'n_bursts_fret']]
nbursts









    Out[34]:






  
    
      
      n_bursts_all
      n_bursts_fret
    
    
      sample
      
      
    
  
  
    
      7d
      1172
      542
    
    
      12d
      1307
      948
    
    
      17d
      2489
      1964
    
    
      22d
      2054
      1672
    
    
      27d
      791
      589

Note on standard errors

The standard error for the Gausian peak fit is reported both as the error returned from the fit routine (*gauss_w_fiterr) and as $\frac{\sigma}{\sqrt{n_{FRET}}}$ (*gauss_w_err, see Columns legend above for more details). The columns starting with E_ are from fit of the corrected FRET histograms. The columns starting with PR_ are from fit of the PR histograms. In principle, the PR errors should be propagated through the correction formula, but this prpagation is not reported here (error is small).

The KDE method does not naturally provide a standard error. In principle the error can be computed using the bootstrap method, which is computationally intensive and is not reported here.

The SNA method returns a distribution of FRET efficiencies that, when shot-noise is added, best fits the experimental histogram. We report estimates of E using the max or the mean of the E distribution. We report the error as the standard deviation of the E distribution. Therefore this value is common to both E estimates (max or mean).

Comparison of different methods

A more empirical way to compute the error is comparing results of the different methods, as done below.



In [35]:

    
E[['PR KDE', 'PR Gauss', 'E KDE']].plot(kind='bar')
E[['PR KDE', 'PR Gauss', 'E KDE']].plot(lw=3);
print('Max error E_alex vs E_corr_pr: %.2f' % (E['E KDE'] - E['PR KDE']).abs().max())
print('Max error E_alex vs E_beta:    %.2f' % (E['E KDE'] - E['PR Gauss']).abs().max())
print('Max error E_beta vs E_corr_pr: %.2f' % (E['PR Gauss'] - E['PR KDE']).abs().max())









    



Max error E_alex vs E_corr_pr: 1.21
Max error E_alex vs E_beta:    1.42
Max error E_beta vs E_corr_pr: 1.16



In [36]:

    
x = [int(idx[:-1]) for idx in out.index]
plt.plot(x, 'E KDE', data=E)
plt.plot(x, 'PR KDE', data=E)
plt.plot(x, 'PR Gauss', data=E)
plt.xlabel('Distance in base-pair')
plt.ylabel('FRET');



In [37]:

    
E['E KDE'] - E['PR KDE']









    Out[37]:





7d     0.026698
12d    1.206923
17d   -0.460290
22d    0.118158
27d    0.452226
dtype: float64

NOTE: Fitting $d_T$ to match $E$ from corrected histograms with $E$ from PR correction formula produces a max difference of 1% for the 12d sample. The match is well below the fitting accuracy (> 2%).

Save



In [38]:

    
E.to_csv('results/usALEX-5samples-E-all-methods.csv', float_format='%.3f')



In [39]:

    
E.round(3)



In [40]:

    
Eerr.to_csv('results/usALEX-5samples-E-all-methods_errors.csv', float_format='%.4f')

	<E>	SDV(E)	Median	Mode
Sample
7d	0.9125	0.0450	0.919782	0.935147
12d	0.7250	0.0675	0.728564	0.736041
17d	0.4225	0.0800	0.421083	0.418087
22d	0.1850	0.0350	0.183270	0.179754
27d	0.0800	0.0000	0.080000	0.080000

	SNA_E_mean	SNA_E_max
Sample
7d	0.9125	0.9351
12d	0.7250	0.7360
17d	0.4225	0.4181
22d	0.1850	0.1798
27d	0.0800	0.0800

	E_gauss_w	E_kde_w	E_gauss_w_err	E_gauss_w_fiterr	n_bursts_all	n_bursts_fret
sample
7d	0.927817	0.9338	0.002130	0.002408	1654	641
12d	0.748896	0.7508	0.002467	0.001494	1692	1157
17d	0.480670	0.4906	0.002001	0.002126	3459	2588
22d	0.280392	0.2752	0.001454	0.001775	2798	2181
27d	0.194729	0.1944	0.002032	0.001035	1146	792

	E_gauss_w	E_kde_w	E_gauss_w_err	E_gauss_w_fiterr	n_bursts_all	n_bursts_fret
sample
7d	0.924307	0.9290	0.002583	0.001782	1172	542
12d	0.727043	0.7398	0.002990	0.001995	1307	948
17d	0.422624	0.4268	0.002549	0.002008	2489	1964
22d	0.181067	0.1774	0.001888	0.001536	2054	1672
27d	0.083971	0.0826	0.002856	0.001585	791	589

	E_alex	E_datasheet
sample
7d	92.90	92.484823
12d	73.98	71.288927
17d	42.68	40.040846
22d	17.74	13.131377
27d	8.26	2.782340

	E KDE	E Gauss	PR KDE	PR Gauss	SNA E mean	SNA E max
7d	92.90	92.4307	92.873302	92.225478	91.25	93.51
12d	73.98	72.7043	72.773077	72.560799	72.50	73.60
17d	42.68	42.2624	43.140290	41.983996	42.25	41.81
22d	17.74	18.1067	17.621842	18.247880	18.50	17.98
27d	8.26	8.3971	7.807774	7.848009	8.00	8.00

	E_gauss_w_err	E_gauss_w_fiterr	PR_gauss_w_err	PR_gauss_w_fiterr	SNA_SDV
sample
7d	0.2583	0.1782	0.2130	0.2408	4.50
12d	0.2990	0.1995	0.2467	0.1494	6.75
17d	0.2549	0.2008	0.2001	0.2126	8.00
22d	0.1888	0.1536	0.1454	0.1775	3.50
27d	0.2856	0.1585	0.2032	0.1035	0.00

	E KDE	E Gauss	PR KDE	PR Gauss	SNA E mean	SNA E max
7d	92.90	92.431	92.873	92.225	91.25	93.51
12d	73.98	72.704	72.773	72.561	72.50	73.60
17d	42.68	42.262	43.140	41.984	42.25	41.81
22d	17.74	18.107	17.622	18.248	18.50	17.98
27d	8.26	8.397	7.808	7.848	8.00	8.00